Ecologic, Geoclimatic, and Genomic Factors Modulating Plague Epidemics in Primary Natural Focus, Brazil

Plague is a deadly zoonosis that still poses a threat in many regions of the world. We combined epidemiologic, host, and vector surveillance data collected during 1961–1980 from the Araripe Plateau focus in northeastern Brazil with ecologic, geoclimatic, and Yersinia pestis genomic information to elucidate how these factors interplay in plague activity. We identified well-delimited plague hotspots showing elevated plague risk in low-altitude areas near the foothills of the plateau’s concave sectors. Those locations exhibited distinct precipitation and vegetation coverage patterns compared with the surrounding areas. We noted a seasonal effect on plague activity, and human cases linearly correlated with precipitation and rodent and flea Y. pestis positivity rates. Genomic characterization of Y. pestis strains revealed a foundational strain capable of evolving into distinct genetic variants, each linked to temporally and spatially constrained plague outbreaks. These data could identify risk areas and improve surveillance in other plague foci within the Caatinga biome.

https://www.apac.pe.gov.br).The yearly averages were calculated as the average of monthly averages from the given year.Monthly average was determined by the average values of the measured water volume from all the meteorological stations available in the region.For the seasonal analysis, we calculated the average values for each month considering the study timeframe (1961)(1962)(1963)(1964)(1965)(1966)(1967)(1968)(1969)(1970)(1971)(1972)(1973)(1974)(1975)(1976)(1977)(1978)(1979)(1980).The list of all meteorological stations used in the study is provided in Appendix 2 Table 3. Importantly, the pluviosity-related analyses are limited to the portion of the Araripe plateau located in the Pernambuco state because there was poor data availability for Ceará and Pauí states during the study timeframe.

Laboratory Testing for Plague
The captured rodents were maintained in quarantine cages and were observed on a daily basis.The individuals that died during quarantine were immediately tested for plague and the survivors were tested 4 weeks after capturing.The laboratorial diagnosis in rodents during the PPP program was performed through Y. pestis detection by direct microscopy observation of spleen imprints and blood smears, followed by conventional bacteriology as previously described (1).Suspected colonies were confirmed with the bacteriophage test (2).The fleas were also tested in bacterial culturing.The diagnosis in humans was performed through the combination of clinical and epidemiologic assessment and laboratory testing (bubo aspirates and blood cultures).

Geoprocessing
Study area: The geographic scope encompasses the plateau, characterized by an expansive summit plateau situated at an elevation of 950 m above sea level.The plateau is surrounded by scalloped escarpments undergoing noticeable erosive retreat, featuring a topographical disparity of 250 m in the extreme west (Araripina county) and 500 m in the east (Exu and Moreilândia counties).
For the production of the georeferenced maps, coordinates were collected with on-site visits.Locations of interest were georeferenced using the global positioning system (GPS) technology, using a GPS receiver model eTrex Vista Cx, Garmin (Kansas City, USA), UTM projection, Datum SIRGAS 2000, with a minimum accuracy of 10 m.The Cartographic Base (limit of municipalities, districts, census tracts, subnormal agglomerations), data on the estimated population and hydrography information were obtained from the Brazilian Institute of Geography and Statistics (IBGE -http://www.ibge.gov.br).
where human cases of plague occurred, the radius was calculated from the matrix of distance of the points and the mean and standard deviation (SD) values, applied to the following equation: R = x̄±x̄σ.The calculated value was weighted by the number of cases per location.Incidence map was calculated as follows: the total number of plague cases in the period from 1961 to 1980 per municipality divided by the population in the middle of the period (1970), adjusted for 100,000 inhabitants/year.The space-time scan statistics (https://www.satscan.org)used the discrete Poisson model.To increase the granularity of the analysis, the municipalities were subdivided into smaller polygons, called census tracts (n = 268).The reference population size was that of the 2000 census (first year in which the population was recorded at a census tracts level).For the construction of pluviosity maps, the rainfall data described above was interpolated using the Inverse Weighted Distance -IDW method, in the linear interpolation and quartile mode.
The Normalized Difference Vegetation Index (NDVI) map was constructed using the mosaic of satellite images that represents the extension of the territory of Araripe Plateau.The formula NDVI = (NIR -red) / (NIR + red) was used, where NIR is near infrared and red: visible red light spectrum.LandSat4 images were downloaded from the GloVis-USGS database (https://earthexplorer.usgs.gov).The hydrography was obtained from the Mineral Resources Research Company (CPRM) (https://www.cprm.gov.br/en/Hydrology-83) and the Digital Elevation Model (DEM) data was obtained from the Shuttle Radar Topography Mission (SRTM) refined for the Brazilian territory from the original resolution of 3 arc seconds to 1 arc second using a geostatistical approach (http://www.dsr.inpe.br/topodata).

Ecologic Networks
We constructed Y. pestis potential transmission networks based on interactions between small mammal species (reservoirs) and the fleas (vectors) captured during plague surveillance activities between 1966-1980 in the Araripe plateau.A network was structured for each year, with nodes representing small mammals and fleas species, and links denoted the relative proportion of fleas species per small mammals species, resulting in 15 networks.
We analyzed the fundamental properties of each host-vector plague network using the bipartite R package (3).The structural properties of each network were analyzed temporally through the examination of connectance, which realized the proportion of possible interactions between fleas and small mammals species (4).We tested for a nested arrangement of the networks using the quantitative method weighted NODF (5).Network robustness (6) were estimated by quantifying the weighted loss of interactions between fleas after randomly removing species of small mammals, and vice versa.We searched for groups of species that interact more inside the group than outside using the Q modularity metrics (7).Due to variations in network size, modularity and nestedness were standardized to z-scores.This involved comparing observed values minus the mean value estimated by null models, divided by the standard deviation of null models, using the vaznull null model method (8).
We anticipated higher connectance and link density during epidemic years, potentially amplifying transmission probabilities.Furthermore, the presence of generalist and abundant species was expected to boost nestedness values during epidemics, thereby promoting plague transmission within the network.We hypothesized that more modular networks, by limiting connections into distinct groups, would reduce transmission flow during low activity years.
Additionally, we expected higher network robustness in epidemic years, as they maintain more connections even after species loss.When a species was removed from epidemic networks, it was expected that other generalist species would compensate for the interactions without losing connections within the subset.The network's structural metrics were correlated with the epidemiologic status of the year (Epidemic >5 human plague cases; non-epidemic ≤5 human plague cases) to test for the impact of network properties on the outbreaks.

Statistical Analysis of Eco-Epidemiologic Data
Comparison between the epidemiologic status of the year with continuous variables was performed by linear regression.The R-squared for the rodents' capture success were calculated using an exponential (one-phase decay) nonlinear regression.Principal component analysis (PCA) was calculated selecting PCs based on eigenvalues method.Regressions, PCA and Receiver Operating Characteristic (ROC) curves were calculated using GraphPad Prism v10.1 (Dotmatics, Boston, USA).The Sankey diagram was designed using the Sankey Diagram Generator (http://sankey-diagram-generator.acquireprocure.com).Statistic tests were run using a 95% confidence interval and considered significant when p<0.05.

Linear Discriminant Analysis
To test the contribution of variables to the prediction of plague epidemics by year, the years were assigned to groups according to the results from the PCA analysis: epidemic years (when human cases >5) and non-epidemic years (when human cases ≤5).We carried a linear discriminant analysis (LDA) with cross validation with different combinations of the epidemiologic variables (Rodent capture success, Flea positivity, Rodent positivity, Flea index) and network variables (modularity, vector robustness, host robustness).The discriminant analyses were carried only with the epidemiologic variables and with all combinations of network variables.A contingency table was obtained based on the different combination of variables which years were classified as epidemic and not epidemic as well as the probability of classification of each year.

Whole-Genome Sequencing
The Y. pestis cultures isolated in the Araripe Plateau are maintained in the cultures collection (Fiocruz/CYP) of the SRP from the IAM.In previous studies, 407 cultures were recovered and their genomes sequenced and analyzed (9,10).Here we revived and sequenced 33 additional strains that were originally contaminated, using the CYP broth medium (11).
DNA extraction was carried out using the DNeasy Blood & Tissue Kit (Qiagen) in accordance with the manufacturer's recommendations.Subsequently, the extracted DNA was quantified using the Qubit® 4 Fluorometer and the Qubit dsDNA High Sensitivity Kit (Thermo Fisher Scientific Inc.).Genomic library preparation was conducted using the Nextera XT Library Preparation Protocol (Illumina Inc.).Subsequently, library quantification and normalization were conducted using the ProNex® NGS Library Quant Kit (Promega) through Quantitative Real-Time PCR (qPCR) for standardized sample input using the 7500 Fast Real-Time PCR System (Applied Biosystems).After quantification, the samples were diluted to 19 pmol in ultrapure water and transferred to a single microtube containing Hybridization Buffer HT1 (Illumina Inc.).Subsequently, they were denatured at 96°C for 2 minutes in a thermocycler and a total of 600 µL of the denatured samples was loaded into a MiSeq Reagent Kits v3 (Illumina Inc.).

Genome Assembly, Annotation, and Core Genome
The quality of the sequencing data of the 439 Brazilian strains of Y. pestis was evaluated using FastQC 0.11.8 and the results were grouped by MultiQC v1.10 as previously described (12,13).The sequencing metrics and metadata for the 33 newly sequenced Brazilian strains are available in Appendix 2 Tables 4, 5. Sequencing data was filtered using Trimmomatic v. 0.38 (14) and the assembly was performed using the VelvetOptimiser.To perform gene predictions and functional annotations of the newly assembled genomes, the Prokka pipeline (15) was used as described by Pitta et al. (9).GenomeTools 1.5.8 (16) was used to evaluate the annotations performed.

SNV Calling and SNV Core Genome
We conducted SNV calling with the Snippy software (17) and the CO92 strain was used as a reference genome.The results obtained were subsequently employed to generate a core SNV alignment, which served as the basis for the subsequent steps involved in constructing a phylogenetic analysis.Given the substantial volume of genomes, the 'snippy-multi' script (an integral component of the Snippy package) was employed for the automated analysis of all genomes in comparison to the reference genome.

Phylogenetic Analyses
The alignments were used as input for IQ-TREE2 (18) to perform phylogenetic analyses based on the core SNV profile with a bootstrap value of 1,000 and the ultrabootstrap approximation technique to enhance computational efficiency and reduce analysis time.
Additionally, the ModelFinder tool (19), integrated into IQ-TREE2, was employed to automatically assess the best substitution model for the phylogenetic analysis.The results were visualized with the iTOL (Interactive Tree of Life) web platform (20).

NextStrain Dataset
We retrieved from the NextStrain dataset 538 Y. pestis genomes (filters: samples dated later than 1900 and not from Brazil).The metadata from the web service was subsequently downloaded, and based on the bioproject associated with each sample, a list for the download of these genomes was generated through the NCBI web service.Out of the 538 genomes, 64 are no longer available on NCBI; therefore, a total of 474 Y. pestis genomes were successfully downloaded (Appendix 3).A variable regarding the geographic focus was created for the phylogenetic analysis.

Characterization of the Phylogenetic Subgroups SNVs
Based on the variant call file (VCF) generated by Snippy during the core SNP analysis and the phylogeny constructed from this analysis, we identified which the apomorphic SNVs for each of the phylogenetic subgroups deriving from the "Araripe ancestor" group.Subsequently, SnpEff was used to predict the functional impacts of these variants in the highlighted samples, focusing on changes such as amino acid substitutions.The mutation list is available in Appendix 2 Table 1.